Smart Paper Notes

Prediction of new outlinks for focused crawling

Thi Kim Nhung Dang, Doina Bucur, Berk Atil, Guillaume Pitel, Frank Ruis, Hamidreza Kadkhodaei, Nelly Litvak

Category: Machine Learning

2021-11-09

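Discovering new hyperlinks enables Web crawlers to find new pages that have not yet been indexed. This is especially important for focused crawlers, because they strive to provide a comprehensive analysis of specific parts of the Web and therefore prioritize discovering new pages over discovering changes in content. In the literature, changes in hyperlinks and changes in content are usually considered together. However, there is also evidence that these two types of change are not necessarily related. Moreover, many studies on predicting changes assume that a long history of a page is available, which is unattainable in practice. The aim of this work is to provide a methodology for detecting new links effectively using a short history. To this end, we use a dataset of ten crawls taken at one-week intervals. Our study consists of three parts. First, we gain insight into the data by analyzing empirical properties of the number of new outlinks. We observe that these properties are, on average, stable over time, but that there is a large difference between the emergence of hyperlinks to pages inside and outside the domain of the target page (internal and external outlinks, respectively). Next, we provide statistical models for three targets: the link change rate, the presence of new links, and the number of new links. These models include features used earlier in the literature as well as new features introduced in this work. We analyze the correlation between the features and investigate their informativeness. A notable finding is that, if the history of the target page is not available, our new features, which represent the history of related pages, are the most predictive of new links in the target page. Finally, we propose ranking methods as guidelines for focused crawlers to discover new pages efficiently; these achieve excellent performance with respect to the corresponding targets.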
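The targets named in this abstract (presence and number of new links, split into internal and external outlinks) can be illustrated with a minimal sketch. It assumes that a "new" outlink is one absent from the previous crawl and that "internal" means sharing the target page's host; both are simplifying assumptions, and the function names and URLs are hypothetical rather than taken from the paper.

```python
from urllib.parse import urlparse


def new_outlinks(previous_crawl, current_crawl):
    """Return outlinks present in the current crawl but not in the previous one."""
    return set(current_crawl) - set(previous_crawl)


def split_internal_external(page_url, links):
    """Split links by whether they stay on the host of the target page (assumption)."""
    host = urlparse(page_url).netloc
    internal = {link for link in links if urlparse(link).netloc == host}
    external = set(links) - internal
    return internal, external


# Hypothetical weekly snapshots of one page's outlinks.
page = "https://example.org/news"
week1 = {"https://example.org/a", "https://other.net/x"}
week2 = {"https://example.org/a", "https://example.org/b", "https://third.com/y"}

fresh = new_outlinks(week1, week2)
internal, external = split_internal_external(page, fresh)
has_new_link = len(fresh) > 0   # target: presence of new links
num_new_links = len(fresh)      # target: number of new links
```

A crawler could rank pages by a prediction of `num_new_links`; the statistical models that produce such predictions are not reproduced here.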

Translated by Google Translate

IMPaSh: A Novel Domain-shift Resistant Representation for Colorectal Cancer Tissue Classification

Trinh Thi Le Vuong, Quoc Dang Vu, Mostafa Jahanifar, Simon Graham, Jin Tae Kwak, Nasir Rajpoot

Category: Computer Vision

2022-08-23

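The appearance of histopathology images depends on the tissue type, the staining, and the digitization process. These factors vary from source to source and are potential causes of domain-shift problems. Because of this, and despite the great success of deep learning models in computational pathology, a model trained on one domain may still perform sub-optimally when applied to another. To overcome this, we propose a new augmentation called PatchShuffling and a novel self-supervised contrastive learning framework, named IMPaSh, for pre-training deep learning models. Using these, we obtain a ResNet50 encoder that extracts image representations resistant to domain shift. We compare our derived representation with those obtained by other domain-generalization techniques by using them for cross-domain classification of colorectal tissue images. We show that the proposed method outperforms other traditional histology domain-adaptation methods and state-of-the-art self-supervised learning methods. Code is available at: https://github.com/trinhvg/impash.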
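The abstract does not spell out how PatchShuffling works. The sketch below shows one plausible reading, assuming it means cutting an image into non-overlapping square patches and permuting them at random; the function `patch_shuffle` and its parameters are hypothetical and may differ from the authors' implementation.

```python
import random


def patch_shuffle(image, patch_size, seed=None):
    """Shuffle non-overlapping square patches of a 2D image given as a list of rows."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Top-left corner of every patch, in row-major order.
    corners = [(r * patch_size, c * patch_size) for r in range(rows) for c in range(cols)]
    shuffled = corners[:]
    random.Random(seed).shuffle(shuffled)
    out = [[None] * width for _ in range(height)]
    # Copy the patch found at `src` into the slot at `dst`.
    for (dst_y, dst_x), (src_y, src_x) in zip(corners, shuffled):
        for dy in range(patch_size):
            for dx in range(patch_size):
                out[dst_y + dy][dst_x + dx] = image[src_y + dy][src_x + dx]
    return out


# Toy 4x4 "image" whose pixel values encode their original position.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
shuffled_image = patch_shuffle(image, patch_size=2, seed=0)
```

The shuffled image keeps every pixel and every patch intact and changes only the global layout, which is the property a contrastive framework could exploit when pairing views.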


NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu

Category: Computer Vision

2021-04-21

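This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. The challenge uses the new Large-scale Diverse Video (LDV) dataset and has three tracks. Tracks 1 and 2 aim to enhance videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing videos compressed by x265 at a fixed bit rate. In addition, Tracks 1 and 3 target improved fidelity (PSNR), while Track 2 targets improved perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12, 8, and 11 teams submitted final results for Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. Challenge homepage: https://github.com/renyang-home/ntire21_venh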
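The fidelity tracks are scored with PSNR. As a reminder of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), here is a minimal sketch for 8-bit pixel values. The function name and the flat-list input format are illustrative choices, not the challenge's evaluation code.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by one grey level, so MSE = 1.
score = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```

Higher values mean the enhanced frame is closer to the uncompressed reference. Perceptual quality, the goal of Track 2, is not captured by this number.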


Neural Collapse in Deep Linear Network: From Balanced to Imbalanced Data

Hien Dang, Tan Nguyen, Tho Tran, Hung Tran, Nhat Ho

Category: Machine Learning | Machine Learning (Statistics)

2023-01-01

Modern deep neural networks have achieved superhuman performance in tasks ranging from image classification to game play. Surprisingly, these varied, complex systems with massive numbers of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. [Papyan20]. Recent papers have shown theoretically that the global solutions of the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses. Furthermore, we extend our research to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse in this setting.


Deployment of UAVs for Optimal Multihop Ad-hoc Networks Using Particle Swarm Optimization and Behavior-based Control

Ngan Duong Thi Thuy, Duy Nam Bui, Manh Duong Phung, Hung Pham Duy

Category: Robotics

2022-12-26

This study proposes an approach for establishing an optimal multihop ad-hoc network using multiple unmanned aerial vehicles (UAVs) to provide emergency communication in disaster areas. The approach has two stages: the first uses particle swarm optimization (PSO) to find optimal positions at which to deploy the UAVs, and the second uses a behavior-based controller to navigate the UAVs to their assigned positions without colliding with obstacles in an unknown environment. Several constraints related to the UAVs' sensing and communication ranges are imposed to ensure that the approach is applicable in real-world scenarios. A number of simulation experiments with data loaded from real environments have been conducted. The results show that the proposed approach not only succeeds in establishing multihop ad-hoc routes but also meets the requirements for real-time deployment of UAVs.
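The first stage relies on particle swarm optimization. The sketch below is a basic global-best PSO on a toy objective, included only to show the velocity and position updates; the cost function, bounds, and coefficient values are assumptions and do not model the paper's sensing or communication constraints.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy objective: squared distance to a hypothetical target position (3, -2).
def toy_cost(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2


best_position, best_cost = pso_minimize(toy_cost, [(-10.0, 10.0), (-10.0, 10.0)])
```

In a deployment setting each particle would encode the positions of all UAVs, and the objective would penalize broken links or uncovered areas instead of a simple distance.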


Fixed-budget online adaptive mesh learning for physics-informed neural networks. Towards parameterized problem inference

Thi Nguyen Khoa Nguyen, Thibault Dairay, Raphaël Meunier, Christophe Millet, Mathilde Mougeot

Category: Machine Learning

2022-12-22

Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their ability to incorporate physical laws into models. PINNs integrate the physical constraints by minimizing the residuals of the partial differential equations (PDEs) on a set of collocation points. The distribution of these collocation points appears to have a large impact on the performance of PINNs, and the assessment of sampling methods for these points is still an active topic. In this paper, we propose a Fixed-Budget Online Adaptive Mesh Learning (FBOAML) method, which decomposes the domain into sub-domains and adapts the training collocation points based on the local maxima and local minima of the PDE residuals. The stopping criterion is based on a reference data set, which leads to an adaptive number of iterations for each specific problem. The effectiveness of FBOAML is demonstrated on non-parameterized and parameterized problems. The impact of the hyper-parameters of FBOAML is investigated, and a comparison with other adaptive sampling methods is provided. The numerical results demonstrate substantial gains in the accuracy of PINNs with FBOAML over classical PINNs with non-adaptive collocation points. We also apply FBOAML to a complex industrial application involving coupling between mechanical and thermal fields. We show that FBOAML is able to identify the high-gradient locations and even gives better predictions for some physical fields than classical PINNs with collocation points taken on a pre-adapted finite element mesh.


Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation

Yizhou Dang, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang, Xiaoxiao Xu, Qinghui Sun, Hong Liu

Category: Machine Learning

2022-12-16

Sequential recommendation is an important task that predicts the next item to access based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between the two items. However, we observe that the time intervals in a sequence may vary significantly, which makes user modeling ineffective because of "preference drift". We conducted an empirical study to validate this observation and found that a sequence with uniformly distributed time intervals (denoted a uniform sequence) is more beneficial for performance than one with greatly varying time intervals. Therefore, we propose to augment sequence data from the perspective of time intervals, which has not been studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) that transform the original non-uniform sequence into a uniform sequence by taking the variance of the time intervals into account. Then, we devise a control strategy to apply data augmentation to item sequences of different lengths. Finally, we implement these improvements on a state-of-the-art model, CoSeRec, and validate our approach on four real datasets. The experimental results show that our approach performs significantly better than the other 11 competing methods. Our implementation is available: https://github.com/KingGugu/TiCoSeRec.
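The notion of a uniform sequence can be made concrete by measuring the variance of the gaps between consecutive interactions. This is a minimal sketch under the assumption that population variance of the gaps is a reasonable uniformity score; the function name and the example timestamps are hypothetical, and the five Ti-operators themselves are not reproduced.

```python
from statistics import pvariance


def interval_variance(timestamps):
    """Population variance of the gaps between consecutive interaction times."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return pvariance(gaps)


# Hypothetical interaction times (in days) for two users.
uniform_user = [0, 2, 4, 6, 8]     # evenly spaced interactions
bursty_user = [0, 1, 2, 30, 31]    # a long pause in the middle

uniform_score = interval_variance(uniform_user)
bursty_score = interval_variance(bursty_user)
```

A lower score indicates a more uniform sequence, so an augmentation operator could prefer edits that reduce this value.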


Ensembling Transformers for Cross-domain Automatic Term Extraction

Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, Senja Pollak

Category: Natural Language Processing

2022-12-12

Automatic term extraction plays an essential role in domain language understanding and in several downstream natural language processing tasks. In this paper, we present a comparative study of the predictive power of Transformer-based pretrained language models for term extraction in a multi-language, cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and multi-word terms, we also experiment with ensembles of mono- and multilingual models by taking the intersection or union of the term sets output by different language models. Our experiments have been conducted on the ACTER corpus covering four specialized domains (Corruption, Wind energy, Equitation, and Heart failure) and three languages (English, French, and Dutch), and on the RSDO5 Slovenian corpus covering four additional domains (Biomechanics, Chemistry, Veterinary, and Linguistics). The results show that employing monolingual models outperforms the state-of-the-art approaches from related work that leverage multilingual models, for all languages except Dutch and French when the term extraction task excludes named-entity terms. Furthermore, by combining the outputs of the two best-performing models, we achieve significant improvements.
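The ensembling step described here, intersection or union over the term sets of different models, can be sketched with plain set operations. The function names and the example terms are hypothetical, and the scoring helper uses the usual set-based precision, recall, and F1, which is an assumption about how such outputs would be compared with a gold list.

```python
def ensemble_terms(predictions, mode="union"):
    """Combine term sets predicted by several models via union or intersection."""
    sets = [set(p) for p in predictions]
    if not sets:
        return set()
    if mode == "union":
        return set.union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError("mode must be 'union' or 'intersection'")


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of predicted terms against gold terms."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical outputs of two models on a heart-failure text.
model_a = {"heart failure", "ejection fraction", "patient"}
model_b = {"heart failure", "ejection fraction", "diuretic"}
gold = {"heart failure", "ejection fraction", "diuretic"}

union_scores = precision_recall_f1(ensemble_terms([model_a, model_b], "union"), gold)
inter_scores = precision_recall_f1(ensemble_terms([model_a, model_b], "intersection"), gold)
```

Union tends to raise recall at the cost of precision, and intersection does the opposite, which is why the choice between them depends on the evaluation target.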


AUC Maximization for Low-Resource Named Entity Recognition

Ngoc Dang Nguyen, Wei Tan, Wray Buntine, Richard Beare, Changyou Chen, Lan Du

Category: Natural Language Processing | Machine Learning

2022-12-09

Current work in named entity recognition (NER) uses either cross entropy (CE) or conditional random fields (CRF) as the objective/loss function to optimize the underlying NER model. Both of these traditional objective functions generally produce adequate performance when the data distribution is balanced and there are sufficient annotated training examples. But since NER is inherently an imbalanced tagging problem, model performance in low-resource settings can suffer when these standard objective functions are used. Based on recent advances in area under the ROC curve (AUC) maximization, we propose to optimize the NER model by maximizing the AUC score. We give evidence that simply combining two binary classifiers that maximize the AUC score yields a significant performance improvement over traditional loss functions in low-resource NER settings. We also conduct extensive experiments to demonstrate the advantages of our method under low-resource and highly imbalanced data distributions. To the best of our knowledge, this is the first work that brings AUC maximization to the NER setting. Furthermore, we show that our method is agnostic to different types of NER embeddings, models, and domains. The code to replicate this work will be provided upon request.
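For readers unfamiliar with the quantity being maximized, the sketch below computes the AUC score of a binary classifier as the fraction of positive/negative pairs that are ranked correctly. It illustrates the metric only; the paper's training objective is a surrogate for this quantity that the abstract does not specify. The function name and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical token scores from an "entity vs. non-entity" binary classifier.
token_scores = [0.9, 0.8, 0.3, 0.1]
token_labels = [1, 0, 1, 0]
auc = roc_auc(token_scores, token_labels)
```

Because the score depends only on how positives rank against negatives, it is not inflated by the large number of non-entity tokens, which is the property that makes it attractive for imbalanced tagging.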


OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang

Category: Computer Vision | Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-08

Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Although they are a promising route toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference. It also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves, on average, 95% of the performance of 15 task-finetuned models with only 16% of their parameters, showcasing the performance reliability of the multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
